Abstract

A Leakage-Biased Domino circuit family is proposed that main- 
tains high speed in active mode but which can be rapidly placed 
into a low-leakage inactive state by using leakage currents them- 
selves to bias internal nodes. A 32-bit Han-Carlson domino adder 
circuit is used to compare LB-Domino with conventional single- and
dual-Vt domino circuits. For equal delay and noise margin, the
LB-Domino technique gives a two-decade reduction in steady-state
leakage energy compared to a dual-Vt technique.

Introduction 

Energy dissipation has emerged as the primary design constraint 
for many systems, from portable electronics to high-performance 
microprocessors. Until recently, the dominant cause of energy dis- 
sipation in digital CMOS has been dynamic switching of load ca- 
pacitances. Continuing reductions in feature size reduce capaci- 
tance and supply voltage and hence dynamic switching energy per 
operation but, to maintain performance, threshold voltages must 
also be scaled down with supply voltage. Unfortunately, lowering 
the threshold voltage increases static leakage current exponentially, 
and within a few process generations it is predicted that energy
dissipation from static leakage current could be comparable to
dynamic switching energy [3, 4].
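
The exponential dependence can be illustrated with the usual first-order subthreshold model, in which leakage changes by one decade for every S millivolts of threshold shift. The sketch below is illustrative only: the 100 mV/decade swing is an assumed value rather than a figure from this work, and the function name is hypothetical.

```python
def leakage_ratio(vt_high, vt_low, swing_mv_per_decade=100.0):
    """Ratio of low-Vt to high-Vt subthreshold leakage (first-order model).

    vt_high, vt_low: threshold voltages in volts.
    swing_mv_per_decade: assumed subthreshold swing S in mV/decade.
    """
    delta_mv = (vt_high - vt_low) * 1000.0
    return 10.0 ** (delta_mv / swing_mv_per_decade)


# Lowering the threshold by 100 mV at the assumed 100 mV/decade swing
# raises leakage by roughly one decade; 200 mV gives roughly two decades.
print(leakage_ratio(0.40, 0.30))
print(leakage_ratio(0.50, 0.30))
```

Real devices deviate from this model (temperature, drain-induced barrier lowering, and gate leakage all matter), so it should be read as a rough scaling argument only.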

A number of techniques have been proposed to combat this in- 
crease in leakage power. These approaches can be divided into two 
categories. The first category focuses on the static design-time se- 
lection of slow transistors on non-critical paths. These techniques 
include: conventional transistor sizing, lower Vdd [8, 10], stacked 
gates [14, 24, 21], longer channels [7], higher threshold voltages 
[20, 9, 19, 23, 1], and thicker Tox; we collectively refer to these
as statically-selected slow transistors (SSSTs). Once SSST tech- 
niques have been applied, most leakage current is concentrated on 
critical paths. For example, in a recent embedded PowerPC 750 
design, the lowest threshold transistors accounted for only 5% of 
the total transistor width but around 50% of the total leakage [11]. 

Critical path transistors cannot be permanently slowed down to 
reduce leakage without affecting circuit performance. The second 
category of leakage reduction techniques dynamically switches the
fast transistors into a low-leakage state during idle periods. Tech- 
niques to deactivate fast transistors dynamically include body bi- 
asing, sleep transistors, and sleep vectors; we collectively refer to 
these as dynamically-deactivated fast transistors (DDFTs). Body- 
biasing [13, 17, 18, 6, 5] can reduce leakage to low levels, but 
incurs a large energy overhead to charge highly capacitive wells
and takes significant time to apply, and so can only be profitably 
applied for long idle times. Sleep transistors [13, 15, 23, 22, 16]
placed in series with the power supply can reduce leakage to low 
levels but these impact circuit speed and add to circuit area. In 
addition, switching the sleep transistors can also have large energy 
overheads due to charging and discharging of the virtual power sup- 
ply nodes. Sleep vector techniques [24, 21] drive input vectors into 
a circuit such that leakage currents are minimized. These tech- 
niques are fast, but it is difficult to find a good sleep vector that will 
propagate a low-leakage state throughout a circuit. Adding sleep 
vector circuitry to force intermediate nodes to the desired values 
can increase circuit delay and transition energy overhead. 

Domino logic is often used on critical paths, and several DDFT 
techniques have been proposed to reduce leakage on idle domino 
blocks. Dual-Vt domino [23] requires additional input gating to 
force the internal nodes into a sleep state, which reduces performance
and increases active energy. Also, as shown below, the high-Vt
keepers increase active energy once noise margin is equalized
[23]. MHS-Domino [1] modifies a clock-delayed keeper circuit 
to force internal dynamic nodes into a low leakage state. How- 
ever, the internal node is pulled down through a PMOS leaving the 
possibility of an intermediate voltage on the dynamic node of the 
first stage of a domino chain if the data inputs are not high. This 
can cause short-circuit current in the static output inverter until the 
leakage through the input transistors finally pulls the dynamic node 
to ground. 

This paper presents a new DDFT circuit family, Leakage-Biased 
Domino (LB-Domino). LB-Domino uses sleep transistors only on 
non-critical paths and uses the leakage current itself to bias inter- 
nal critical paths into a minimal leakage state — leakage currents 
are used to apply the optimal sleep vector. This technique has lit- 
tle impact on active energy or delay when applied to conventional 
domino circuitry. LB-Domino provides a low-leakage state which 
can be rapidly entered and exited with low transition energy over- 
head. This enables fine-grain leakage reduction, where small sub- 
circuits can be deactivated for short periods of time. 

Leakage-Biased Domino 

An LB-Domino buffer is shown in Figure 1. This example is a 
footless domino buffer without a clock transistor in the dynamic 
pull-down stack, but the LB technique can also be applied to footed 
domino stacks. Only two small sleep transistors are added to a con- 
ventional CMOS domino gate: a high-Vt PMOS in series with the 
keeper power supply and a high-Vt NMOS in series with the static 
output logic pulldown. When the sleep signal is deasserted, the cir- 
cuit operates as a conventional domino gate with minimal perfor- 
mance degradation because there are no additional series transistors 
in the critical evaluate path. 

Figure 1: A leakage-biased domino buffer.

To place the circuit into sleep mode, the clock signal is left high
after an evaluate cycle and the sleep signal is asserted (sleep=1
and sleepb=0). If the data input was high, node1 would have
been discharged. If the data input was low, node1 is high but the
leakage through the NMOS dynamic pull-down stack will slowly
discharge the node to ground (the precharge and keeper pull-up
transistors are high-Vt devices with significantly lower leakage 
than the pull-down stack). The NMOS sleep transistor is added 
to prevent any short-circuit current in the static output logic while 
the dynamic node discharges to ground. The static output, node2, 
will rise as the static pull-up turns on. As the leakage current of 
one domino gate causes its output node to rise, this will cause the 
NMOS transistors in the pulldown stacks of the following domino 
gates to turn on, accelerating the discharge of their internal dynamic 
nodes. In this way, LB-Domino gates bias themselves into a low- 
leakage state where the internal dynamic nodes are discharged low 
and static nodes are charged high regardless of input vector state. 
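
The self-biasing ripple described above can be sketched with a toy behavioral model. This is not a circuit simulation: the per-stage leakage discharge times and the accelerated discharge delay `t_fast` are hypothetical values, and the model assumes a simple chain in which every stage starts with its dynamic node high.

```python
def sleep_ripple(leak_times, t_fast):
    """Time at which each stage's dynamic node reaches the low-leakage state.

    leak_times: assumed time for leakage alone to discharge each stage's
                dynamic node (one entry per stage, arbitrary time units).
    t_fast:     assumed extra delay for a stage to discharge once the
                previous stage's static output has risen.
    """
    done = []
    for i, t_leak in enumerate(leak_times):
        if i == 0:
            # The first stage has no predecessor; only leakage discharges it.
            done.append(t_leak)
        else:
            # A later stage discharges by its own leakage or by being driven
            # from the previous stage's output, whichever happens first.
            done.append(min(t_leak, done[i - 1] + t_fast))
    return done


print(sleep_ripple([20.0, 35.0, 50.0, 28.0], t_fast=0.5))
```

With these assumed numbers, the later stages settle shortly after the first one even though their own leakage would take much longer, which is the qualitative behavior the text describes.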

When the internal dynamic node is discharged, the main leakage 
is across the high-Vt PMOS precharge transistor which is turned 
off by the clock signal remaining high. The leakage path of the 
static output includes at least two series NMOS transistors, one of 
which is a high-Vt device. A conventional precharge cycle is used 
to move from sleep mode back to active mode. 

Compared with MHS-Domino, LB-Domino has a simpler sleep 
mechanism that is compatible with, but does not require, a clock- 
delayed keeper. LB-Domino also avoids short-circuit current in the 
static output inverter of the first gate of a domino chain. 

Evaluation Methodology 

The carry generation circuit of a 32-bit Han-Carlson adder [12] was 
used to evaluate LB-Domino. The carry generation circuit is pure 
domino with six levels of alternating dynamic and static logic. The 
basic propagate-generate cells are shown in Figure 2. Four variants 
of the design were compared. The first uses only low-Vt transis- 
tors (LVT), while the second is a dual-Vt (DVT) design where only 
evaluation phase transistors are low-Vt. The third variant is an LB- 
Domino (LB) design based on the DVT design but with high-Vt 
sleep transistors added to the keeper feedback circuits and the static 
logic pulldowns. The fourth variant (LB2) is another LB-Domino 
design which only uses high-Vt for the precharge transistors and 
for the added sleep transistors. 

For all four designs, the input and output noise margin of all
dynamic circuits was set to 10% of the supply voltage and the
precharge/evaluation delays were equalized to within 1% error
through transistor sizing. The circuits were designed for an
existing TSMC 180 nm process and a projected 70 nm process
obtained from the BPTM project [2] (Table 1). All simulations used
HSPICE.

Table 1: Process parameters.

Process                 180 nm            70 nm
High Vt (NMOS/PMOS)     0.46 V/-0.45 V    0.39 V/-0.40 V
Low Vt (NMOS/PMOS)      0.27 V/-0.23 V    0.15 V/-0.18 V
Vdd                     1.8 V             0.9 V
Temperature             100 °C            100 °C

Since both active energy and leakage power are dependent upon
inputs, three different input vectors were considered (Table 2):
vec1 doesn't discharge any dynamic nodes, vec3 discharges all
dynamic nodes, and vec2 discharges half and leaves half high.

Table 2: Input vectors.

            a             b             ci
Vector 1    0x00000000    0x00000000
Vector 2    0xffffffff    0x00000000
Vector 3    0xffffffff    0xffffffff    1
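
The effect of the three vectors can be checked with a small sketch of the first-level generate and propagate signals. The assumptions here are mine: a dynamic node is taken to discharge when its signal evaluates high, propagate is modeled as a bitwise OR of the operands, the carry-in is ignored, and only the first logic level is counted.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1

# Operand pairs (a, b) for the three input vectors.
VECTORS = {
    "vec1": (0x00000000, 0x00000000),
    "vec2": (0xFFFFFFFF, 0x00000000),
    "vec3": (0xFFFFFFFF, 0xFFFFFFFF),
}


def discharged_fraction(a, b):
    """Fraction of first-level generate/propagate nodes that discharge."""
    generate = a & b & MASK
    propagate = (a | b) & MASK  # assumption: OR-style propagate
    high = bin(generate).count("1") + bin(propagate).count("1")
    return high / (2 * WIDTH)


for name, (a, b) in VECTORS.items():
    print(name, discharged_fraction(a, b))
```

Under these assumptions the fractions come out as none, half, and all for the three vectors, matching the description in the text.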



Results 

Figures 3 and 4 show the delay and active energy consumption 
for 180 nm and 70 nm processes respectively. The active energy
of DVT is greater than that of LVT because the high-Vt keeper 
transistors must be sized up to give equal noise margin and equal 
precharge delay. For the same reason, the active energy of LB is 
greater than that of DVT. However, LB2 can meet the delay con- 
straints with only a small increase in active energy over the LVT 
design because it uses only a small number of high-Vt transistors. 

Figure 2: Cells for a 32-bit Han-Carlson adder. Low-Vt transistors
are shaded.

Figures 5 and 6 show the steady-state leakage power for 180 nm
and 70 nm processes respectively. The leakage power of DVT is
very sensitive to input values. For vec3 with clk=1, all the
high-Vt transistors in DVT are turned off and the lowest leakage
power is obtained. On the other hand, for vec1 with clk=1, the leakage
is comparable to the LVT design. The sleep-state leakage power
of LB and LB2 is independent of input vector because leakage cur- 
rents bias the internal nodes into the lowest leakage state over some 
transition time. The LB schemes have worst-case sleep-state leak- 
age currents that are around two decades lower than the LVT and 
DVT designs. For the 180 nm process, the LB scheme is preferred 
for circuits that spend enough time in sleep mode as it has lower 
leakage than LB2, but for circuits that are more active, LB2 has 
lower active energy and reasonable steady-state leakage. For the 
70 nm process, LB2 is always better than LB since it has lower 
active energy and lower steady-state leakage than LB. 

Figures 7 and 8 show how energy consumption evolves over time 
when the circuit is put into a sleep state for 180 nm and 70 nm pro- 
cesses respectively. The energy curves show the energy consump- 
tion when the circuit sleeps for the specified time, including the 
cost to transition the circuit into and out of the sleep state (e.g., 
the energy to switch the gates of the sleep transistors). For LVT 
and DVT schemes, the sleep energy is just linearly proportional to 
sleep time as leakage currents are constant. The sleep energy curve 
of LB shows a very different characteristic. There is a large jump in 
energy after a short sleep time (around 20 ns for 180 nm and around 
1 ns for 70 nm). At this point, the static output of the first domino 
stage charges up to the threshold voltage, and causes the following 
stage to move rapidly to the low-leakage state. This process quickly 
ripples through the chain of domino gates. The energy stored in 
any precharged dynamic nodes is lost and must be restored during 
precharge when the circuit is next woken up, hence the steep rise 
in effective sleep energy dissipation. After this point, the energy 
curve has a very shallow slope due to the lowered leakage currents. 

For short sleep times, the LB schemes require more total energy 
than simply idling an LVT or DVT circuit. But for longer sleep 
times the energy cost of discharging the internal dynamic nodes is 
amortized and the lower sleep leakage current yields lower overall 
energy. For LB and LB2, the cross-over point is around 2 µs in
the 180 nm process for the worst case (vec1). However, the cross-over
point in the 70 nm process is under 10 ns because active energy
scales down faster than leakage power. 
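
The cross-over point can be approximated with a first-order break-even calculation. This is a minimal sketch under simplifying assumptions: leakage power is treated as constant in both the idle and sleep states (ignoring the transient before the nodes settle), and the numbers in the example are hypothetical rather than measurements from this work.

```python
def break_even_sleep_time(e_transition, p_idle, p_sleep):
    """Sleep time beyond which entering sleep mode saves energy.

    Solves e_transition + p_sleep * t = p_idle * t for t.
    e_transition: energy to enter and leave sleep, including restoring
                  discharged dynamic nodes (joules).
    p_idle:       leakage power when simply idling (watts).
    p_sleep:      steady-state leakage power in sleep mode (watts).
    """
    if p_idle <= p_sleep:
        raise ValueError("sleep mode must leak less than idling")
    return e_transition / (p_idle - p_sleep)


# Hypothetical values: 1 pJ transition cost, 110 uW idle leakage,
# 10 uW sleep leakage.
t = break_even_sleep_time(1e-12, 110e-6, 10e-6)
print(f"break-even sleep time: {t * 1e9:.1f} ns")
```

The expression shows why the cross-over shrinks at 70 nm: the transition energy in the numerator scales down faster than the leakage difference in the denominator.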

Conclusion 

As leakage currents become more significant, the leakage currents 
themselves can be used to bias nodes into low-leakage states. When 
used to dynamically deactivate critical path circuits in projected 
70 nm process technologies, LB-Domino provides a two-decade
reduction in steady-state leakage current compared with low-Vt or
dual-Vt domino at equal delay and noise margin. LB-Domino has 
sub-cycle deactivation and reactivation latencies, and because leak- 
age currents are used to bias the circuit, LB-Domino also has low 
transition energy overheads. Using LB-Domino to place circuits 
into a sleep state can yield net energy savings even for sleep times 
of under 10 ns. This makes dynamic fine-grain circuit deactivation 
practical, where small pieces of an active system can be powered
down for short periods of time to save leakage energy.

Figure 3: Delay and active energy consumption: 180 nm process.

Figure 4: Delay and active energy consumption: 70 nm process.



[Bar chart: steady-state leakage power of LVT, DVT, LB, and LB2 for
vec1, vec2, and vec3.]

Figure 5: Steady-state leakage power: 180 nm process, clk is high
for all and sleep is asserted for LB and LB2.

Figure 6: Steady-state leakage power: 70 nm process, clk is high for
all and sleep is asserted for LB and LB2. Note that y-axis is log-scale.

Figure 7: Cumulative sleep energy: 180 nm process.

Figure 8: Cumulative sleep energy: 70 nm process.



